SRC  TR  85-29 


Convergence  of  Implicit 
Discretization  Schemes  for  Linear 
Differential  Equations  with 
Application  to  Filtering 


By 


M.  Piccioni 


Report  Documentation  Page 


1.  REPORT  DATE 

1985 

2.  REPORT  TYPE 

3.  DATES  COVERED 

00-00-1985  to  00-00-1985 

4.  TITLE  AND  SUBTITLE 

5a.  CONTRACT  NUMBER 

Convergence  of  Implicit  Discretization  Schemes  for  Linear  Differential 

5b.  GRANT  NUMBER 

Equations with Application to Filtering

5c.  PROGRAM  ELEMENT  NUMBER 

6.  AUTHOR(S) 

5d.  PROJECT  NUMBER 

5e.  TASK  NUMBER 

5f.  WORK  UNIT  NUMBER 

7.  PERFORMING  ORGANIZATION  NAME(S)  AND  ADDRESS(ES) 

University  of  Maryland, Electrical  Engineering  Department, College 

Park, MD, 20742 

8.  PERFORMING  ORGANIZATION 

REPORT  NUMBER 

9.  SPONSORING/MONITORING  AGENCY  NAME(S)  AND  ADDRESS(ES) 

10.  SPONSOR/MONITOR'S  ACRONYM(S) 

11.  SPONSOR/MONITOR'S  REPORT 
NUMBER(S) 

12.  DISTRIBUTION/AVAILABILITY  STATEMENT 

Approved  for  public  release;  distribution  unlimited 

13.  SUPPLEMENTARY  NOTES 

14.  ABSTRACT 

see  report 

15.  SUBJECT  TERMS 

16.  SECURITY  CLASSIFICATION  OF: 

17.  LIMITATION  OF 
ABSTRACT 

18.  NUMBER 
OF  PAGES 

24 

19a.  NAME  OF 
RESPONSIBLE  PERSON 

a.  REPORT 

unclassified 

b.  ABSTRACT 

unclassified 

c.  THIS  PAGE 

unclassified 



CONVERGENCE  OF  IMPLICIT  DISCRETIZATION  SCHEMES  FOR  LINEAR 
DIFFERENTIAL  EQUATIONS  WITH  APPLICATION  TO  FILTERING 


by 

M. Piccioni*

Department  of  Electrical  Engineering 
University  of  Maryland 
College  Park,  MD  20742 


* On leave from: Dipartimento di Matematica, II Università di Roma, Via Orazio Raimondo, 00173 Roma, Italy.

This  work  was  supported  partially  through  ONR  Grant  N00014-84-K-0614,  partially 
through  a  grant  from  the  Minta  Martin  Aeronautical  Research  Fund,  College  of 
Engineering,  University  of  Maryland  at  College  Park,  and  partially  through  Grant  No. 
203.01.36  of  CNR,  Italy. 


1.  INTRODUCTION 


The motivation for the present work arises from the following well-known problem in nonlinear filtering. Let $(\bar X_t)$ be an $R^d$-valued diffusion process with generator $A$ and let $(\bar W_t)$ be an independent $R^m$-valued standard Brownian motion, both defined on $(\bar\Omega, \bar F, \bar P)$. Let

$$Y_t = \int_0^t g(\bar X_s)\,ds + \bar W_t, \quad t \ge 0, \tag{1.1}$$

where $g: R^d \to R^m$. Compute recursively the conditional expectations $\bar E(f(\bar X_t) \mid Y_s,\ 0 \le s \le t)$ for some "sufficiently large" class of functions $f$ defined on $R^d$. Boundedness and smoothness assumptions on the coefficients of $A$ will be given later. We assume from now on that $g_i \in L^\infty(R^d)$, $i = 1, \dots, m$.

A convenient representation for the desired conditional expectations is given by the Kallianpur-Striebel formula [11]. Define on another probability space $(\Omega, F, P)$ a diffusion $(X_t)$ with the same distribution as $(\bar X_t)$. Then

$$\bar E(f(\bar X_t) \mid Y_s,\ 0 \le s \le t) = \frac{E\big[f(X_t)\exp\big(\int_0^t g^T(X_s)\,dY_s - \tfrac12 \int_0^t |g(X_s)|^2\,ds\big)\big]}{E\big[\exp\big(\int_0^t g^T(X_s)\,dY_s - \tfrac12 \int_0^t |g(X_s)|^2\,ds\big)\big]}, \tag{1.2}$$

which reduces the problem to the computation of integrals on the paths of $(X_t)$ (the stochastic integral, for each path of $(X_t)$, is a Wiener integral computed on the given path of $(Y_t)$). By differentiating the numerator, a weak stochastic partial differential equation is obtained for a multiple $q_t$ of the conditional probability measure, usually called the Zakai equation [31]:

$$dq_t = A^* q_t\,dt + q_t\, g^T dY_t. \tag{1.3}$$

Of course, if we want to solve (1.3) recursively "on line" on a digital computer, the best we can do is to provide well-behaved discretization algorithms in both space and time.

It is useful to design algorithms which discretize (1.3) but still retain the representation (1.2), merely changing the process $(X_t)$ involved. This could be obtained by replacing the diffusion by a continuous-time finite-state Markov chain with generator $A_h$, which of course is of finite-difference type (the simplest example is provided by Kushner [17]). But it continues to hold if time is implicitly discretized and the stochastic Trotter product formula [5, 2] is used, thereby obtaining the equation

$$(I - \Delta A_h^*)\, q_{(k+1)\Delta} = \exp\big(g^T (Y_{(k+1)\Delta} - Y_{k\Delta}) - \tfrac12 |g|^2 \Delta\big)\, q_{k\Delta}, \quad k = 0, 1, \dots \tag{1.4}$$


This scheme was first obtained by Clark [5], by implicitly discretizing time in the "robust" version of (1.3). For us it is more interesting to know that the solution of (1.4) can be written essentially as in (1.2), replacing $(X_t)$ by a discrete-time Markov chain $(X^{h,\Delta}_{k\Delta})$ with transition matrix $(I - \Delta A_h)^{-1}$ [20]. This relates the convergence of the approximation scheme (1.4) to the weak convergence of $(X^{h,\Delta}_{k\Delta})$ to $(X_t)$ when $h, \Delta \to 0$ ($h$ is thought of as a mesh parameter of the space discretization grid). But this is known to be ensured by the convergence of the discrete semigroup described by the free behaviour of equation (1.4) (i.e., when $Y = 0$) to the semigroup generated by $A$ [14]. The relevant point is that we would like to establish convergence for $h$ and $\Delta$ going to zero independently. Our main Theorem 2.4, given in the following section, gives sufficient conditions for this only in terms of $A_h$ and $A$. This is reasonable, in that the matrices $A_h$ are the parameters of the scheme (1.4) and they have to be chosen with the best possible band structure, so as to solve equation (1.4) as quickly as possible, without inverting $I - \Delta A_h$ [1]. It turns out that these conditions are slightly stronger than those given by the Trotter-Kato-Kurtz theorem [27, 11, 13] for the convergence of the semigroup generated by $A_h$ to that generated by $A$, therefore confirming in an abstract setting that the implicit discretization of time allows an independent choice of discretization steps in space and time, respectively [24]. Thus Theorem 2.4 could be of some interest independently of the filtering problem; of course it can be successfully applied to show convergence for different approximate filters which usually do not retain any probabilistic meaning, like those built by Galerkin methods [7].

In Section 3 we review the already cited results connecting convergence of Markov (Feller-Dynkin) semigroups with weak convergence of their sample paths. With Theorem 2.4 and this type of result, in Section 4 we obtain convergence results for the functionals involved in (1.2), computed by averaging on the paths of $(X^{h,\Delta}_{k\Delta})$. For this it is useful to obtain for those functionals a Lipschitz condition in $Y$, independently of $(h, \Delta)$, thereby extending previous results for the Kushner space discretization scheme [20]. This is done by the arguments used in [17], that is, integration by parts in (1.2) and some martingale estimates (which require conditions on $g$, too).

In Section 5 this result is shown to imply the robustness of the approximate filters (1.4), in that if they are forced by

$$Y^\varepsilon_t = \int_0^t g(\bar X^\varepsilon_s)\,ds + \bar W^\varepsilon_t, \quad t \ge 0, \tag{1.5}$$

where $(\bar X^\varepsilon_t, \bar W^\varepsilon_t)$ converges weakly, as $\varepsilon \to 0$, to $(\bar X_t, \bar W_t)$, nonetheless the joint distribution of $(\bar X^\varepsilon_t)$ and the $(h, \Delta)$-approximate filter computed on the paths of $(Y^\varepsilon_t)$ converges weakly, as $(h, \Delta, \varepsilon) \to 0$, to the "ideal" one given by (1.2) and (2.2).

The final section deals with two short examples. The first is Kushner's scheme; the other is a variation of it, intended to show that different choices are possible, depending on the particular structure of the diffusion model. The sufficient conditions are easily checked in any case, but to avoid cumbersome notation we limit ourselves to the case $d = 1$. However, when the dimension of the state space increases, the number of reasonable choices for the approximating chains increases, surely influencing the speed of computation. Much more work remains to be done on these issues. In any case, boundedness conditions on the coefficients of the diffusion are needed, because the state space is not compact. This suggests that the convergence and robustness results obtained, which hold in general for continuous Feller-Dynkin processes on locally compact state spaces, will be more meaningful for nondegenerate diffusions on compact Riemannian manifolds.
2.  THE  ABSTRACT  CONVERGENCE  THEOREM

First of all we recall the Hille-Yosida theorem. Let $L$ be any Banach space.

THEOREM 2.1. A linear operator $A$ on $L$ is the infinitesimal generator of a (strongly continuous) contraction semigroup of operators on $L$ if and only if $D(A)$ is dense in $L$ and, for $\Delta > 0$, $I - \Delta A$ is invertible on the whole of $L$, with an inverse which is a contraction.


We remark that Hille's proof [9] is based precisely on the convergence of the implicit time-discretization scheme (governed by $J_\Delta^{[t/\Delta]}$, with $J_\Delta = (I - \Delta A)^{-1}$) to the generalized solution of the corresponding Cauchy problem with operator $A$, when $\Delta$ goes to zero. Yosida [30] approximates this solution with the exponentials of the bounded operators $A_\Delta = \Delta^{-1}(J_\Delta - I)$. Of course, because $A$ generates a unique semigroup (which will be called $\{e^{At}\}$), the contraction-valued functions of time $J_\Delta^{[t/\Delta]}$ and $e^{A_\Delta t}$ have the same asymptotic behaviour as $\Delta \to 0$. We will utilize a generalization of this result, due to Kurtz [13], to prove Theorem 2.4 by using the more convenient Yosida-type argument.
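The two approximations just described can be compared on the simplest possible case. The sketch below assumes the scalar generator $A = -\lambda$ on $L = R$ (a hypothetical choice made only for illustration; the names `implicit_euler` and `yosida` are not from the report) and evaluates both the implicit scheme $J_\Delta^{[t/\Delta]}$ and the Yosida exponential $e^{A_\Delta t}$ against $e^{At}$.

```python
import math

def implicit_euler(lam, t, delta):
    """J_delta^[t/delta] applied to 1, for the scalar generator A = -lam."""
    j = 1.0 / (1.0 + delta * lam)        # J_delta = (I - delta*A)^{-1}
    return j ** math.floor(t / delta)

def yosida(lam, t, delta):
    """exp(t * A_delta) with A_delta = (J_delta - I) / delta."""
    j = 1.0 / (1.0 + delta * lam)
    a_delta = (j - 1.0) / delta          # equals -lam / (1 + delta*lam)
    return math.exp(t * a_delta)

lam, t = 2.0, 1.0                        # illustrative values only
exact = math.exp(-lam * t)
errors = []
for delta in (0.1, 0.01, 0.001):
    e_implicit = abs(implicit_euler(lam, t, delta) - exact)
    e_yosida = abs(yosida(lam, t, delta) - exact)
    errors.append((delta, e_implicit, e_yosida))
    print(f"delta={delta:g}  implicit={e_implicit:.2e}  yosida={e_yosida:.2e}")
```

In this scalar setting both errors shrink as $\Delta$ decreases, which is the behaviour the remark above attributes to the two constructions.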

Let us put ourselves in the setting of [13], which allows one to consider convergence of Markov processes defined on different state spaces. Suppose that for each $h > 0$, $L_h$ is a Banach space and there exists a bounded map $P_h: L \to L_h$ such that for each $f \in L$, $\lim_{h \to 0} \|P_h f\| = \|f\|$, which in turn implies that $\|P_h\| \le M$ for some $M > 0$. On each $L_h$ an infinitesimal generator $A_h$ of some contraction semigroup is specified. For numerical applications $L_h$ will always be finite-dimensional, so that $A_h$ will be bounded, but this is not assumed now.

We say that the family of semigroups $\{e^{A_h t}\}$, $h > 0$, converges to $\{e^{At}\}$ as $h \to 0$ if, for any $f \in L$ and $T > 0$,

$$\lim_{h \to 0} \sup_{t \in [0,T]} \|e^{A_h t} P_h f - P_h e^{At} f\| = 0. \tag{2.1}$$

Conditions for (2.1) to hold are given by the Trotter-Kato-Kurtz theorem, which is reported below.
THEOREM 2.2. The following are equivalent:

i) $\{e^{A_h t}\}$ converges to $\{e^{At}\}$ as $h \to 0$, in the sense of (2.1);

ii) for each $f \in L$

$$\lim_{h \to 0} \|(I - A_h)^{-1} P_h f - P_h (I - A)^{-1} f\| = 0; \tag{2.2}$$

iii) for each $f$ in a core $S$ of $A$ there exists $f_h \in D(A_h)$ such that

$$\lim_{h \to 0} \|f_h - P_h f\| = \lim_{h \to 0} \|A_h f_h - P_h A f\| = 0. \tag{2.3}$$


We recall that a core $S$ of the generator $A$ is a linear manifold included in $D(A)$ such that $A$ is the closure of $A|_S$. The most used cores are linear manifolds dense in $L$ which are invariant under $e^{At}$, for $t > 0$.

Theorem 2.2 will be a basic tool in the sequel, in that both conditions ii) and iii) will be used to prove Theorem 2.4. The fact that ii) implies i) is due to Trotter [27], whereas the converse is due to Kato [11]. Condition iii) was introduced by Kurtz [13].


If an implicit discretization scheme is applied to the evolution equation governed by $A_h$, the discrete contraction semigroup $\{J_{h,\Delta}^k\}$, with $J_{h,\Delta} = (I - \Delta A_h)^{-1}$, is obtained on the space $L_h$. Our objective is to find sufficient conditions under which $\{J_{h,\Delta}^{[t/\Delta]}\}$ converges to $\{e^{At}\}$ in the same sense as (2.1), as $(h, \Delta) \to 0$. As mentioned before, we consider continuous-time semigroups "asymptotically equivalent" to $\{J_{h,\Delta}^{[t/\Delta]}\}$ by means of the following estimate [13]. Define the bounded operator on $L_h$

$$A_{h,\Delta} = \Delta^{-1}(J_{h,\Delta} - I), \tag{2.4}$$

which is easily shown to generate a contraction semigroup.


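THEOREM 2.3. For any $f \in L$, $t > 0$ and $\varepsilon > 0$

$$\big\|\big(J_{h,\Delta}^{[t/\Delta]} - e^{A_{h,\Delta} t}\big) P_h f\big\| \le 2t\,\|A_{h,\Delta} P_h f\| \wedge \Big(\frac{2\Delta}{\varepsilon^2 t}\,\|P_h f\| + (\varepsilon t + \Delta)\,\|A_{h,\Delta} P_h f\|\Big). \tag{2.5}$$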
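The shape of an estimate such as (2.5) can be understood through a standard Poisson-averaging argument. The following is only a sketch, not necessarily the proof given in [13]; it assumes that $J$ is a contraction, $A_\Delta = \Delta^{-1}(J - I)$, $n = [t/\Delta]$, and that $N$ is a Poisson random variable with mean $\lambda = t/\Delta$ (the symbols $N$ and $\lambda$ are introduced here purely for illustration). Chebyshev's inequality gives $P(|N - \lambda| > \varepsilon\lambda) \le 1/(\varepsilon^2 \lambda)$, and $|n - \lambda| \le 1$.

```latex
\begin{align*}
e^{tA_\Delta} f &= \sum_{k \ge 0} e^{-\lambda}\frac{\lambda^k}{k!}\, J^k f = E\big(J^{N} f\big), \\
\|J^n f - J^k f\| &\le \min\{\, 2\|f\|,\ |n-k|\,\Delta\,\|A_\Delta f\| \,\}, \\
\|J^n f - e^{tA_\Delta} f\| &\le E\,\|J^n f - J^N f\| \\
&\le 2\|f\|\, P(|N-\lambda| > \varepsilon\lambda) + (\varepsilon\lambda + 1)\,\Delta\,\|A_\Delta f\| \\
&\le \frac{2\Delta}{\varepsilon^2 t}\,\|f\| + (\varepsilon t + \Delta)\,\|A_\Delta f\| .
\end{align*}
```

For small $t$ the cruder bound $\|J^n f - f\| + \|e^{tA_\Delta} f - f\| \le 2t\,\|A_\Delta f\|$ is the better one, which is why a minimum of two expressions appears.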
We are now ready to prove the promised convergence theorem.

THEOREM 2.4. Let us suppose that for any $f$ in a core $S$ of $A$, $P_h f \in D(A_h)$ (at least for $h$ sufficiently small) and

$$\lim_{h \to 0} \|A_h P_h f - P_h A f\| = 0. \tag{2.6}$$

Then, for any $f \in L$ and $T > 0$

$$\lim_{(h,\Delta) \to 0} \sup_{t \in [0,T]} \big\|J_{h,\Delta}^{[t/\Delta]} P_h f - P_h e^{At} f\big\| = 0. \tag{2.7}$$


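Proof. By Theorem 2.2 it is enough to prove that for any $f \in L$

$$\lim_{(h,\Delta) \to 0} \|(I - A_{h,\Delta})^{-1} P_h f - P_h (I - A)^{-1} f\| = 0. \tag{2.8}$$

Since $\|P_h\| \le M$ and $\|(I - A_{h,\Delta})^{-1}\| \le 1$, it is enough to prove (2.8) for $f \in S$. Note that it is possible to write

$$A_{h,\Delta} = \Delta^{-1}\big((I - \Delta A_h)^{-1} - I\big) = A_h J_{h,\Delta},$$

so that, for any $g \in D(A_h)$,

$$(I - A_{h,\Delta})^{-1} g = \big(I - A_h (I - \Delta A_h)^{-1}\big)^{-1} g = \big[(I - (\Delta+1) A_h)(I - \Delta A_h)^{-1}\big]^{-1} g = (I - (\Delta+1) A_h)^{-1}(I - \Delta A_h)\, g = J_{h,\Delta+1}(g - \Delta A_h g). \tag{2.9}$$

Therefore, if $f \in S$,

$$\|(I - A_{h,\Delta})^{-1} P_h f - P_h (I - A)^{-1} f\| \le \Delta\,\|J_{h,\Delta+1} A_h P_h f\| + \|J_{h,\Delta+1} P_h f - P_h (I - A)^{-1} f\| \le \Delta\,\|A_h P_h f\| + \|J_{h,\Delta+1} P_h f - P_h (I - A)^{-1} f\|. \tag{2.10}$$

By (2.6) it is clear that for each $f \in S$ there exists $K > 0$ such that, for $h > 0$,

$$\|A_h P_h f\| \le K, \tag{2.11}$$

so that the first term in the r.h.s. of (2.10) goes to zero as $(h, \Delta) \to 0$. The same assumption implies that, for $f \in S$,

$$\lim_{(h,\Delta) \to 0} \|(\Delta + 1) A_h P_h f - P_h A f\| = 0, \tag{2.12}$$

and by Theorem 2.2 the norm of

$$(I - (\Delta+1) A_h)^{-1} P_h f - P_h (I - A)^{-1} f = J_{h,\Delta+1} P_h f - P_h (I - A)^{-1} f$$

goes to zero as $(h, \Delta) \to 0$. This proves that for any $f \in L$, $T > 0$

$$\lim_{(h,\Delta) \to 0} \sup_{t \in [0,T]} \|e^{A_{h,\Delta} t} P_h f - P_h e^{At} f\| = 0. \tag{2.13}$$

To get (2.7) it is enough again to consider $f \in S$ in (2.5). Use the fact that $\|P_h\| \le M$ and, by (2.9) and (2.11), that

$$\|A_{h,\Delta} P_h f\| = \|J_{h,\Delta} A_h P_h f\| \le \|A_h P_h f\| \le K, \quad f \in S, \tag{2.14}$$

to show that the r.h.s. of (2.5) is uniformly bounded with respect to $h$ and can be made uniformly small for $t \in [0,T]$ with an appropriate choice of $\varepsilon$ and taking $\Delta$ sufficiently small. So the fact that $\{e^{A_{h,\Delta} t}\}$ has the same asymptotic behaviour as $\{J_{h,\Delta}^{[t/\Delta]}\}$ (as both $h$ and $\Delta$ go to zero) is obtained and (2.7) is finally established.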


The condition (2.6) is slightly stronger than the mere convergence of the semigroups $\{e^{A_h t}\}$ to $\{e^{At}\}$, in that in (2.3) the particular choice $f_h = P_h f$ is made. But it is interesting that this condition involves only $A_h$; it does not require computing $J_{h,\Delta}$, that is, solving $(I - \Delta A_h) f = g$ for all possible $g \in L_h$, instead of for one $g$ at a time.
3.  FELLER-DYNKIN  SEMIGROUPS  AND  WEAK  CONVERGENCE

Let us consider a particular class of Markov semigroups. Let $E$ and $E_h$ be complete, separable, locally compact spaces and $\eta_h$ be a measurable map from $E_h$ to $E$, for $h > 0$. Let $(X_t)$ and $(Z^h_t)$, $h > 0$, be Markov processes with respect to their own families of $\sigma$-algebras, possibly defined on different probability spaces. We suppose that the corresponding algebraic semigroups of operators $\{T^h_t\}$ and $\{T_t\}$ on $L^\infty(E_h)$ and $L^\infty(E)$, respectively, are Feller-Dynkin [6]. If we denote by $\hat C(E_h)$ ($\hat C(E)$) the Banach space of continuous functions on $E_h$ ($E$) which go to zero at infinity (when the one-point compactification is done), this means that these semigroups are strongly continuous on $\hat C(E_h)$ ($\hat C(E)$). Let $\eta_h$ be a continuous map of $E_h$ into $E$ (if $E_h$ is not compact, $\eta_h$ has to map the infinity of $E_h$ into the infinity of $E$, but this is not usually the case). Those maps induce corresponding bounded linear transformations of $\hat C(E)$ into $\hat C(E_h)$ by

$$(P_h f)(x) = f(\eta_h(x)), \quad x \in E_h. \tag{3.1}$$

Finally define $X^h_t = \eta_h(Z^h_t)$, $h > 0$, and observe that these processes have versions with sample paths in the Skorohod space $D[0,\infty;E]$ of $E$-valued cadlag functions [29]. For metrics on this space we refer to [21, 15]; a particular case will be discussed in the next section.

It is quite clear that the convergence of $\{T^h_t\}$ to $\{T_t\}$ as $h \to 0$ in the sense of (2.1) relates the expectations of functionals of the corresponding processes at each instant of time. The following important theorem, due to Kurtz [14], involves the whole sample paths.

THEOREM 3.1. If $\{T^h_t\}$ converges to $\{T_t\}$ and $X^h_0$ converges weakly to $X_0$, then $(X^h_t)$ converges weakly to $(X_t)$, as $h \to 0$, as a $D[0,\infty;E]$-valued random variable.

The previous theorem refers only to continuous-time Markov processes, but it can be easily extended. The argument used in Section 2 has in fact a stochastic interpretation. Let $A_h$ be the infinitesimal generator of $\{T^h_t\}$ on $\hat C(E_h)$ and $Q_h$ its transition function: then, for $\Delta > 0$, $f \in \hat C(E_h)$

$$(I - \Delta A_h)^{-1} f(x) = \int f(y) \Big[\Delta^{-1} \int_0^\infty e^{-\Delta^{-1} t}\, Q_h(t, x, dy)\,dt\Big], \tag{3.2}$$

which shows that a discrete-time Markov process $(Z^{h,\Delta}_{k\Delta},\ k = 0, 1, \dots)$ can be built such that for each $f \in \hat C(E_h)$

$$E\big(f(Z^{h,\Delta}_{(k+1)\Delta}) \mid Z^{h,\Delta}_{k\Delta} = x\big) = (I - \Delta A_h)^{-1} f(x) = J_{h,\Delta} f(x). \tag{3.3}$$

Now it is quite easy to show that $A_{h,\Delta}$ defined in (2.4) is the infinitesimal generator of a Feller-Dynkin process $(\tilde Z^{h,\Delta}_t)$, which can be obtained with a random time change which turns the intervals between the jumps of $(Z^{h,\Delta}_{[t/\Delta]\Delta})$ into i.i.d. exponential variables with mean $\Delta$. Moreover the distance between the processes $\tilde X^{h,\Delta}_t = \eta_h(\tilde Z^{h,\Delta}_t)$ and $X^{h,\Delta}_{[t/\Delta]\Delta} = \eta_h(Z^{h,\Delta}_{[t/\Delta]\Delta})$ in the $D[0,\infty;E]$-metric goes to zero in probability as $(h, \Delta)$ goes to zero [15]. This allows one to modify the previous theorem in the following way.

THEOREM 3.2. If $\{T^h_t\}$ converges to $\{T_t\}$ and $X^{h,\Delta}_0$ converges weakly to $X_0$, then $(X^{h,\Delta}_{[t/\Delta]\Delta})$ converges weakly to $(X_t)$, as a $D[0,\infty;E]$-valued random variable.
4.  APPLICATION  TO  FILTERING

We return to the problem stated in the Introduction by identifying the "copy" of the state process in (1.2) with the Feller-Dynkin one of the last section. Moreover we now have to suppose that this process is continuous (so that, at least locally, it is a diffusion [8]). We need to introduce an extension of its infinitesimal generator, called the full generator [15], which is a possibly multivalued operator $\tilde A \subset L^\infty(E) \times L^\infty(E)$ such that $(g, h) \in \tilde A$ if and only if

$$g(X_t) - \int_0^t h(X_s)\,ds$$

is a martingale w.r.t. the increasing family of $\sigma$-algebras generated by $(X_t)$.

Suppose now that each component of $g$ in (1.1) is bounded, uniformly continuous and belongs to $D(\tilde A)$, and the products $g_i g_j$, $i, j = 1, \dots, m$, too. We let $\tilde A g = (\tilde A g_1, \dots, \tilde A g_m)$, where $\tilde A g_i$ stands for any element of the $\tilde A$-image of $g_i$, $i = 1, \dots, m$. By integrating by parts inside the expectations of the Kallianpur-Striebel formula (1.2), this can be expressed for any path $y \in C_0[0,\infty;R^m]$ (continuous functions which start from zero) of $(Y_t)$ through the "robust" version [5]

$$\bar E(f(\bar X_t) \mid Y_s = y(s),\ 0 \le s \le t) = \frac{E\big[f(X_t)\exp\big(y^T(t) g(X_t) - \int_0^t y^T(s)\,dg(X_s) - \tfrac12 \int_0^t |g(X_s)|^2\,ds\big)\big]}{E\big[\exp\big(y^T(t) g(X_t) - \int_0^t y^T(s)\,dg(X_s) - \tfrac12 \int_0^t |g(X_s)|^2\,ds\big)\big]} \tag{4.1}$$

for any $f$ bounded and measurable.

In fact, by assumption $g(X_t)$ is an $R^m$-valued semimartingale whose decomposition is given by

$$g(X_t) = g(X_0) + \int_0^t (\tilde A g)(X_s)\,ds + M_t, \tag{4.2}$$

where $M$ is a continuous square-integrable martingale having the matrix-valued increasing process [10]

$$\langle M \rangle^{ij}_t = \int_0^t \big[\tilde A(g_i g_j)(X_s) - g_j(X_s)(\tilde A g_i)(X_s) - g_i(X_s)(\tilde A g_j)(X_s)\big]\,ds. \tag{4.3}$$

Since $\langle M \rangle$ is locally bounded, $R_t = \int_0^t y^T(s)\,dM_s$ and $\exp(R_t - \tfrac12 \langle R \rangle_t)$ are martingales [23], from which the boundedness of the denominator is easily obtained for each $y \in C_0[0,\infty;R^m]$. By the Riesz theorem this implies that for each $y$ there exist finite measures $\mu_t(y)$, $t \ge 0$, such that

$$\bar E(f(\bar X_t) \mid Y_s = y(s),\ 0 \le s \le t) = \frac{\langle f, \mu_t(y) \rangle}{\langle 1, \mu_t(y) \rangle}, \quad t \ge 0, \tag{4.4}$$

where $\langle f, \mu \rangle = \int_E f(x)\,\mu(dx)$ and

$$\Pi_t(y)(dx) = \frac{\mu_t(y)(dx)}{\int_E \mu_t(y)(dz)} \tag{4.5}$$

is a regular conditional probability measure.

Now let us suppose that, for $h > 0$, $A_h$ is the infinitesimal generator of a continuous-time Markov chain with finite state space $E_h = \{1, 2, \dots, N_h\}$. Let $\eta_h$ associate with each state $i$ a point $x^h_i$ in $E$ and for each function $f$ on $E$ let $P_h f$ be the $N_h$-vector of the evaluations of $f$ at the points $\{x^h_1, \dots, x^h_{N_h}\}$, which is always considered with the sup norm. We are allowed to identify $E_h$ with $\eta_h(E_h)$ and the values of $P_h f$ with those of $f$ in the sequel. For $\Delta > 0$ and $y \in C_0[0,\infty;R^m]$ let us consider the implicit time-discretization equation (1.4), which can be rewritten as

$$q^{h,\Delta}_{(k+1)\Delta} = J^T_{h,\Delta}\, B^\Delta_{k\Delta}\, q^{h,\Delta}_{k\Delta}, \quad k = 0, 1, \dots \tag{4.6}$$

$B^\Delta_{k\Delta}$ being an $N_h$-th order diagonal matrix, whose $i$-th diagonal element is $\exp\big(g^T(x^h_i)(y((k+1)\Delta) - y(k\Delta)) - \tfrac12 \Delta |g(x^h_i)|^2\big)$.
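One step of a recursion of the form (4.6) can be carried out without forming the inverse, by solving the linear system $(I - \Delta A_h^T)\, q_{new} = B q$. The sketch below assumes a scalar observation ($m = 1$) and a tridiagonal (birth-death) generator described by hypothetical arrays `down[i]` and `up[i]` of jump rates from state `i` to its neighbours; the grid, rates, function `g(x) = x` and all names are illustrative assumptions, not taken from the report.

```python
import math

def thomas_solve(sub, diag, sup, rhs):
    """Solve a tridiagonal system; sub[i] = M[i][i-1], sup[i] = M[i][i+1]."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = sup[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - sub[i] * c[i - 1]
        c[i] = sup[i] / denom
        d[i] = (rhs[i] - sub[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def filter_step(q, down, up, g_vals, dy, delta):
    """Solve (I - delta*A_h^T) q_new = B q for one observation increment dy."""
    n = len(q)
    # B is diagonal with entries exp(g(x_i)*dy - 0.5*delta*g(x_i)^2).
    rhs = [q[i] * math.exp(g_vals[i] * dy - 0.5 * delta * g_vals[i] ** 2)
           for i in range(n)]
    diag = [1.0 + delta * (up[i] + down[i]) for i in range(n)]
    sub = [0.0] + [-delta * up[i - 1] for i in range(1, n)]      # -delta*A_h[i-1][i]
    sup = [-delta * down[i + 1] for i in range(n - 1)] + [0.0]   # -delta*A_h[i+1][i]
    return thomas_solve(sub, diag, sup, rhs)

n, delta = 5, 0.1
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
down = [0.0, 2.0, 2.0, 2.0, 0.0]     # end points made absorbing
up = [0.0, 2.0, 2.0, 2.0, 0.0]
q0 = [0.1, 0.2, 0.4, 0.2, 0.1]
q_free = filter_step(q0, down, up, [0.0] * n, 0.0, delta)   # no observation term
q1 = filter_step(q0, down, up, grid, 0.3, delta)            # g(x) = x, dy = 0.3
print(sum(q_free), q1)
```

With $g = 0$ the step preserves total mass, since the rows of a generator sum to zero; the band structure keeps the cost of each step linear in the number of states.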


Let us consider on some probability space the discrete-time Markov chain $(X^{h,\Delta}_{k\Delta},\ k = 0, 1, \dots)$ with initial probability vector $q^h_0$ (it is supposed that $1^T q^h_0 = 1$) and transition matrix $J_{h,\Delta}$.

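THEOREM 4.1. The solution of equation (4.6) can be expressed in the following way: for any $f \in R^{N_h}$

$$f^T q^{h,\Delta}_{k\Delta} = E\Big[f(X^{h,\Delta}_{k\Delta}) \exp\Big(\sum_{l=0}^{k-1} \big[g^T(X^{h,\Delta}_{l\Delta})(y((l+1)\Delta) - y(l\Delta)) - \tfrac12 |g(X^{h,\Delta}_{l\Delta})|^2 \Delta\big]\Big)\Big]. \tag{4.7}$$

Proof. It requires only a substitution of (4.7) into (4.6), which yields

$$f^T J^T_{h,\Delta} B^\Delta_{k\Delta}\, q^{h,\Delta}_{k\Delta} = (B^\Delta_{k\Delta} J_{h,\Delta} f)^T q^{h,\Delta}_{k\Delta} = E\Big[E\big(f(X^{h,\Delta}_{(k+1)\Delta}) \mid X^{h,\Delta}_{k\Delta}\big) \exp\Big(\sum_{l=0}^{k} \big[g^T(X^{h,\Delta}_{l\Delta})(y((l+1)\Delta) - y(l\Delta)) - \tfrac12 |g(X^{h,\Delta}_{l\Delta})|^2 \Delta\big]\Big)\Big],$$

which is equal to $f^T q^{h,\Delta}_{(k+1)\Delta}$ by the Markov property of $(X^{h,\Delta}_{k\Delta})$ and the projective property of conditional expectations.

It can be easily shown that (4.7) gives the numerator of a Kallianpur-Striebel type formula for an estimation problem in discrete time.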
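The representation in Theorem 4.1 can be checked by brute force on a very small chain. The sketch below assumes a two-state generator $[[-a, a], [b, -b]]$, a scalar observation, and hypothetical values for `g`, `f`, the initial vector `q0` and the sampled observation path `y` (with `y[l]` standing for $y(l\Delta)$); it compares the recursion (4.6) with the expectation (4.7) computed by enumerating every path of the chain.

```python
import math
from itertools import product

def resolvent_2state(a, b, delta):
    """Transition matrix J = (I - delta*A)^{-1} for A = [[-a, a], [b, -b]]."""
    det = 1.0 + delta * (a + b)
    return [[(1.0 + delta * b) / det, delta * a / det],
            [delta * b / det, (1.0 + delta * a) / det]]

def recursion(q0, J, g, y, delta, k):
    """Iterate q <- J^T B q for k steps, as in (4.6)."""
    q = list(q0)
    n = len(q)
    for l in range(k):
        dy = y[l + 1] - y[l]
        bq = [q[i] * math.exp(g[i] * dy - 0.5 * delta * g[i] ** 2) for i in range(n)]
        q = [sum(J[i][j] * bq[i] for i in range(n)) for j in range(n)]
    return q

def path_expectation(f, q0, J, g, y, delta, k):
    """Right-hand side of (4.7), summing over all paths (X_0, ..., X_k)."""
    n = len(q0)
    total = 0.0
    for path in product(range(n), repeat=k + 1):
        prob = q0[path[0]]
        log_w = 0.0
        for l in range(k):
            prob *= J[path[l]][path[l + 1]]
            dy = y[l + 1] - y[l]
            log_w += g[path[l]] * dy - 0.5 * delta * g[path[l]] ** 2
        total += f[path[k]] * prob * math.exp(log_w)
    return total

delta, k = 0.1, 3                     # illustrative values only
J = resolvent_2state(1.0, 2.0, delta)
g = [-1.0, 1.0]
f = [2.0, -1.0]
q0 = [0.3, 0.7]
y = [0.0, 0.05, -0.02, 0.1]
qk = recursion(q0, J, g, y, delta, k)
lhs = sum(f[i] * qk[i] for i in range(2))
rhs = path_expectation(f, q0, J, g, y, delta, k)
print(lhs, rhs)
```

The enumeration grows exponentially with `k` and is meant only as a check; the recursion is the practical way to compute the same quantity.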

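We can extend the function $q^{h,\Delta}_{k\Delta}$ to continuous time by

$$f^T q^{h,\Delta}_t(y) = E\Big[f(X^{h,\Delta}_{[t/\Delta]\Delta}) \exp\Big(\int_0^t g^T(X^{h,\Delta}_{[s/\Delta]\Delta})\,\dot y(s)\,ds - \tfrac12 \int_0^t |g(X^{h,\Delta}_{[s/\Delta]\Delta})|^2\,ds\Big)\Big] \tag{4.8}$$

if $y$ is in $C^1_0[0,\infty;R^m]$. A similar expression holds in this case for

$$\langle f, \mu_t(y) \rangle = E\Big[f(X_t) \exp\Big(\int_0^t g^T(X_s)\,\dot y(s)\,ds - \tfrac12 \int_0^t |g(X_s)|^2\,ds\Big)\Big]. \tag{4.9}$$

Now let $p^{h,\Delta}_t(y) = q^{h,\Delta}_t(y) / (1^T q^{h,\Delta}_t(y))$. The following theorem states the relevant consequence of Theorem 2.4 for our problem.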


THEOREM 4.2. Let us suppose that the convergence condition (2.6) holds and $p^h_0$ converges weakly to the law of $X_0$, as $h \to 0$. Then, for each $y \in C^1_0[0,\infty;R^m]$, $T > 0$ and $f$ bounded and uniformly continuous

$$\lim_{(h,\Delta) \to 0} \sup_{t \in [0,T]} \big|f^T p^{h,\Delta}_t(y) - \langle f, \Pi_t(y) \rangle\big| = 0. \tag{4.10}$$

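Proof. For each $y$ as above define the function $\varphi_1: D[0,\infty;E] \to D[0,\infty;R]$ as

$$\varphi_1(x)(t) = \exp\Big(\int_0^t g^T(x(s))\,\dot y(s)\,ds - \tfrac12 \int_0^t |g(x(s))|^2\,ds\Big) \tag{4.11}$$

and observe that for each $T > 0$ there exist two real constants $\underline K$ and $\bar K$ such that

$$\underline K \le \log \varphi_1(x)(t) \le \bar K, \quad x \in D[0,\infty;E],\ t \in [0,T]. \tag{4.12}$$

Therefore, if $\varphi_f(x)(t) = f(x(t))\,\varphi_1(x)(t)$, then

$$\sup_{t \in [0,T]} \Big|\frac{f^T q^{h,\Delta}_t(y)}{1^T q^{h,\Delta}_t(y)} - \frac{\langle f, \mu_t(y) \rangle}{\langle 1, \mu_t(y) \rangle}\Big| \le \Big(\inf_{t \in [0,T]} 1^T q^{h,\Delta}_t(y)\Big)^{-1} \Big(\sup_{t \in [0,T]} \big|f^T q^{h,\Delta}_t(y) - \langle f, \mu_t(y) \rangle\big|\Big)$$
$$+ \Big(\inf_{t \in [0,T]} \langle 1, \mu_t(y) \rangle \inf_{t \in [0,T]} 1^T q^{h,\Delta}_t(y)\Big)^{-1} \Big(\sup_{t \in [0,T]} |\langle f, \mu_t(y) \rangle| \sup_{t \in [0,T]} \big|1^T q^{h,\Delta}_t(y) - \langle 1, \mu_t(y) \rangle\big|\Big)$$
$$\le e^{-\underline K}\, E\big(\|\varphi_f(X^{h,\Delta}) - \varphi_f(X)\|_{\infty,T}\big) + e^{-2\underline K} e^{\bar K} \|f\|\, E\big(\|\varphi_1(X^{h,\Delta}) - \varphi_1(X)\|_{\infty,T}\big), \tag{4.13}$$

where $\|\cdot\|_{\infty,T}$ stands for the sup norm on $[0,T]$. Note that we have placed $(X^{h,\Delta})$, $h > 0$, $\Delta > 0$, and $(X_t)$ on the same probability space: by the weak convergence assured by Theorems 2.4 and 3.2 this can be done even assuring that $(X^{h,\Delta}(\omega))$ converges to $(X_t(\omega))$ in $D[0,\infty;E]$ for each $\omega$ [4]. Since the paths of $(X_t)$ are continuous, this implies uniform convergence on each compact set. But $f$ and $g$ are uniformly continuous, so that the two terms under the expectation sign in (4.13) converge to zero for each $\omega$. By the bounded convergence theorem, the proof is accomplished.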


The  successive  step  will  be  to  extend  the  convergence  in  (4.10)  to  all  possible 
This allows one to solve the following "robustness" problem. Let us consider continuous processes $(\bar X^\varepsilon_t, \bar W^\varepsilon_t)$, $\varepsilon > 0$, considered as $C[0,\infty;E] \times C_0[0,\infty;R^m]$-valued random variables, which are a family of "physical" state and noise models depending on some parameter, converging to the "ideal" diffusion plus white noise model of the Introduction $(\bar X_t, \bar W_t)$ as this parameter degenerates. The typical situations to have in mind are carefully reviewed in [16]. Note that the output map defined in (1.1) is defined on each sample path $(\bar x, \bar w)$ of the state and noise processes, yielding a continuous map

$$y: C[0,\infty;E] \times C_0[0,\infty;R^m] \to C_0[0,\infty;R^m] \tag{5.6}$$

with all the spaces endowed with the metric of uniform convergence on compact intervals.

The approximate filter (5.5) is applied to the "physical" output process $Y^\varepsilon = y(\bar X^\varepsilon, \bar W^\varepsilon)$. The following result extends the similar one proved by Kushner [19] for one particular chain in continuous time, in the meantime giving a more direct proof in that unnormalized conditional probabilities are not used.

THEOREM 5.1. Let us suppose that $(\bar X^\varepsilon_t, \bar W^\varepsilon_t)$ converges weakly to $(\bar X_t, \bar W_t)$ as $\varepsilon \to 0$, where $(\bar X_t)$ is a continuous $E$-valued Feller-Dynkin process and $(\bar W_t)$ an independent $R^m$-valued standard Brownian motion. Then, under the hypotheses of Corollary 4.1, the process $(\bar X^\varepsilon_t, \bar W^\varepsilon_t, p^{h,\Delta}_t(Y^\varepsilon))$ converges weakly to $(\bar X_t, \bar W_t, \Pi_t(Y))$ as $(\varepsilon, h, \Delta) \to 0$, considered as $C[0,\infty;E] \times C_0[0,\infty;R^m] \times D[0,\infty;P(E)]$-valued random variables.

Proof. For $\varepsilon > 0$, $h > 0$, $\Delta > 0$, define the functions

$$\chi^{\varepsilon,h,\Delta}: C[0,\infty;E] \times C_0[0,\infty;R^m] \to C[0,\infty;E] \times C_0[0,\infty;R^m] \times D[0,\infty;P(E)],$$
$$\chi^{\varepsilon,h,\Delta}(\bar x, \bar w) = (\bar x, \bar w, p^{h,\Delta}(y(\bar x, \bar w))) \tag{5.7}$$

and $\chi$, with the same domain and range space, defined by

$$\chi(\bar x, \bar w) = (\bar x, \bar w, \Pi(y(\bar x, \bar w))). \tag{5.8}$$

Let $\bar X^{\varepsilon,h,\Delta} = \bar X^\varepsilon$ and $\bar W^{\varepsilon,h,\Delta} = \bar W^\varepsilon$. By the remarks following Corollary 4.1, $\chi^{\varepsilon,h,\Delta}$ converges to $\chi$ uniformly on compact sets. Since $\chi$ is continuous, it suffices to apply Theorem 5.5 in [3] to get the desired result. //

A comprehensive discussion of the meaning of weak convergence-type results like Theorem 5.1 is given in [14]. However, again, the important thing to note is that the way $\varepsilon$, $h$, $\Delta$ approach zero cannot destroy convergence. We believe that those results could be of particular importance for sequential decision problems on partially observed diffusions [1].
6.  TWO  EXAMPLES

Let us first consider the chain proposed by Kushner, which is obtained by suitably modifying a simple difference scheme applied to the generator of a diffusion [16]. We limit ourselves to the one-dimensional case, in which such a scheme always works.

Let $C^k_b$ ($\hat C^k$) be the space of $k$-times continuously differentiable functions on $R$, with all those derivatives bounded (which go to zero at infinity). Let $(X_t)$ be the solution of the martingale problem with full generator

$$(\tilde A f)(x) = \tfrac12 a(x)\,\frac{\partial^2 f}{\partial x^2}(x) + b(x)\,\frac{\partial f}{\partial x}(x), \tag{6.1}$$

where it must be supposed that $a(\cdot)$ is bounded away from zero, $a(x) \ge c > 0$ for $x \in R$, and that $a(\cdot)$ and $b(\cdot)$ are bounded and Hölder, in order to have a well-posed martingale problem [28] and the restriction $A$ of $\tilde A$ to $C^2_b \cap \hat C$ to be extendible to a generator of a Feller-Dynkin semigroup on $\hat C$ [6]. But we also need to use $\hat C^2$ as a core, and for $\hat C^2$ to be invariant under $e^{At}$, $a(\cdot)$ and $b(\cdot)$ also have to be in $C^2_b$. In this case, the parabolic equation $(\partial/\partial t - A) f = 0$ can be differentiated twice w.r.t. the space variable [25].

For each $h > 0$, let us consider a finite grid $G_h$ of equispaced points of distance $h$, which tends to cover the whole line as $h \to 0$, and define a Markov chain on $G_h$ by the following non-zero intensities: for $x \in G_h$

$$a^h(x, x-h) = \frac{1}{2h^2}\,a(x) + \frac{1}{h}\,b^-(x),$$
$$a^h(x, x) = -\frac{1}{h^2}\,a(x) - \frac{1}{h}\,|b(x)|, \tag{6.2}$$
$$a^h(x, x+h) = \frac{1}{2h^2}\,a(x) + \frac{1}{h}\,b^+(x),$$

where $b^+ = \max(b, 0)$ and $b^- = \max(-b, 0)$, so that each row sums to zero, except for the first and the last point of the grid, which are made absorbing. Let $\eta_h$ be the inclusion of $G_h$ into $R$, and let $P_h$ be defined as in Section 4.
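The consistency condition (2.6) for intensities of the form (6.2) can be observed numerically. The sketch below uses hypothetical coefficients `a`, `b` and a hypothetical smooth test function `f` (none of them from the report), and measures the largest value of $|A_h P_h f - P_h A f|$ over interior grid points of a fixed window, ignoring the absorbing end points.

```python
import math

def kushner_rates(a, b, x, h):
    """Intensities of the form (6.2) at an interior grid point x."""
    b_plus, b_minus = max(b(x), 0.0), max(-b(x), 0.0)
    down = a(x) / (2 * h * h) + b_minus / h
    up = a(x) / (2 * h * h) + b_plus / h
    return down, -(down + up), up

def consistency_error(a, b, f, d1f, d2f, h, half_width=3.0):
    """Maximum over interior grid points of |A_h P_h f - P_h A f|."""
    n = int(round(half_width / h))
    worst = 0.0
    for i in range(-n + 1, n):
        x = i * h
        down, diag, up = kushner_rates(a, b, x, h)
        ahf = down * f(x - h) + diag * f(x) + up * f(x + h)
        af = 0.5 * a(x) * d2f(x) + b(x) * d1f(x)
        worst = max(worst, abs(ahf - af))
    return worst

# Hypothetical coefficients and test function, chosen only for illustration.
a = lambda x: 1.0 + 0.5 * math.cos(x)
b = lambda x: -math.sin(x)
f = lambda x: math.exp(-x * x)
d1f = lambda x: -2.0 * x * math.exp(-x * x)
d2f = lambda x: (4.0 * x * x - 2.0) * math.exp(-x * x)

errs = [consistency_error(a, b, f, d1f, d2f, h) for h in (0.2, 0.1, 0.05)]
print(errs)
```

Because the drift is discretized one-sidedly, the error is expected to decrease roughly linearly in $h$ rather than quadratically; that is the price paid for keeping all off-diagonal intensities nonnegative.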
It is clear that condition (2.6) is satisfied once we show that for $f \in \hat C^2$

$$\sup_{x \in R} \Big|\tfrac12 a(x)\frac{d^2 f}{dx^2}(x) + b(x)\frac{df}{dx}(x) - \Big(\frac{1}{2h^2}a(x) + \frac{1}{h}b^-(x)\Big) f(x-h) + \Big(\frac{1}{h^2}a(x) + \frac{1}{h}|b(x)|\Big) f(x) - \Big(\frac{1}{2h^2}a(x) + \frac{1}{h}b^+(x)\Big) f(x+h)\Big| \le \delta(h), \tag{6.3}$$

where $\delta(h)$ goes to zero as $h \to 0$. The behavior at the boundary is controlled by the boundedness assumptions on $a$ and $b$ and the fact that $f \in \hat C^2$. The expression inside the supremum in (6.3) can be rewritten as

$$-\frac{a(x)}{2h^2}\big(f(x+h) + f(x-h) - 2f(x) - f''(x)h^2\big) - \frac{b^+(x)}{h}\big(f(x+h) - f(x) - f'(x)h\big) - \frac{b^-(x)}{h}\big(f(x-h) - f(x) + f'(x)h\big),$$

which clearly shows uniform convergence ($f''$ is in fact uniformly continuous).

Moreover, for any $g \in C^2_b$, the boundedness condition (4.16) is verified, and Corollary 4.1 and Theorem 5.1 can be applied.

Such  method  can  be  extended  to  the  case  Rd,  d  >  1,  with  additional  assump¬ 
tions  on  the  coefficients  U^].  The  verification  of  conditions  (2.7)  and  (4.9) 
is  still  straightforward.  It  is  clear  that  the  method  could  take  into  account 
boundary  conditions,  too. 

Example 2. This rather artificial example serves only to show that reasonable alternatives to the previous space discretization scheme exist, even in dimension one. Of course, this is even more true in higher dimensions, given that the complexity of the topology of a grid increases. Suppose that $a = 1$ in (6.1), and write $b = -\partial V/\partial x$. Usually, it will be easier to compute the "potential" $V$ than its derivative, so that it makes sense to define the following approximating chain, holding the grid fixed as before:
$$a^h(x, x-h) = \begin{cases} \dfrac{1}{h^2}\big(\tfrac12 + V(x) - V(x-h)\big), & \text{if } V(x-h) < \min\{V(x), V(x+h)\}, \\[4pt] \dfrac{1}{2h^2}, & \text{otherwise,} \end{cases}$$
$$a^h(x, x+h) = \begin{cases} \dfrac{1}{h^2}\big(\tfrac12 + V(x) - V(x+h)\big), & \text{if } V(x+h) < \min\{V(x), V(x-h)\}, \\[4pt] \dfrac{1}{2h^2}, & \text{otherwise,} \end{cases} \tag{6.4}$$

letting $a^h(x,x) = -(a^h(x,x-h) + a^h(x,x+h))$ and the other terms be zero (including the boundary ones). Condition (2.6) is reduced to checking

$$\sup_{x \in R} \Big|\frac{1}{h^2}\big[(V(x) - V(x \pm h)) f(x \pm h) - (V(x) - V(x \pm h)) f(x)\big] + \frac{\partial V}{\partial x}(x)\,\frac{\partial f}{\partial x}(x)\Big| \le \delta(h)$$

for each $f \in \hat C^2$, where $\delta(h)$ goes to zero as $h \to 0$. This is because, when the $V$-terms in (6.4) repeatedly disappear around $x$, as $h \to 0$, it is necessarily $(\partial V/\partial x)(x) = 0$. This allows one to prove the boundedness condition for any $g \in C^2_b$, so the convergence property of the filtering algorithm derived from (6.4) is the same as in the previous example.
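A quick way to see why intensities of this potential-based kind are consistent with the drift $b = -\partial V/\partial x$ is to compute the local mean and variance of the jump per unit time. The sketch below follows (6.4) as it can be read from the partly garbled display in the source, so the exact form of the rates is an assumption; the potential `V` and the function names are hypothetical.

```python
def potential_rates(V, x, h):
    """Off-diagonal intensities in the spirit of (6.4), with a = 1."""
    base = 1.0 / (2.0 * h * h)
    down, up = base, base
    if V(x - h) < min(V(x), V(x + h)):
        down = (0.5 + V(x) - V(x - h)) / (h * h)
    if V(x + h) < min(V(x), V(x - h)):
        up = (0.5 + V(x) - V(x + h)) / (h * h)
    return down, up

def local_drift_and_variance(V, x, h):
    down, up = potential_rates(V, x, h)
    drift = h * (up - down)           # first moment of the jump per unit time
    variance = h * h * (up + down)    # second moment of the jump per unit time
    return drift, variance

V = lambda x: 0.5 * x * x             # hypothetical potential, so b(x) = -x
drift, variance = local_drift_and_variance(V, 1.0, 0.01)
drift0, variance0 = local_drift_and_variance(V, 0.0, 0.01)
print(drift, variance, drift0, variance0)
```

Only differences of $V$ enter the rates, so the derivative of the potential is never evaluated; at a local minimum of $V$ both rates reduce to $1/(2h^2)$ and the local drift vanishes.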